Method and apparatus for compression and de-compression of spectral data

ABSTRACT

A method and apparatus for data compression, particularly applicable to spectral signals such as Fast Fourier Transforms of vibration data. The data is merged to remove redundant frequencies when recorded at multiple sample rates, thresholded with respect to a noise floor to remove even more redundant data, and then the positions of non-zero signal values, with respect to the noise floor, are recorded in a first dataword and the non-zero signal values themselves are all recorded concatenated to form a second dataword. The compressed data set consists of the first and second datawords, together with the value of the noise floor, maximum original amplitude and the broadband power. In the event of successive data sets having the same or similar locations for non-zero signal values a re-use flag may be set and the locations dataword discarded. Preferably the signal values are non-linearly quantized to further reduce the amount of data.

This invention relates to a method and apparatus for compression and de-compression of spectral data.

The requirement for data to be compressed arises in a huge number of situations primarily for two reasons:

1) to reduce data storage requirements;

2) to reduce the bandwidth required for transmission of data.

These two factors impose different constraints on data compression techniques. The first requires good average compression of a signal, i.e. poor compression on transient events can be mitigated by better compression on steady-state operation. The second requires good consistent compression: in general the data transmission rate or bandwidth of an available communications medium is rigidly limited and so high transient data rates, where the data compression technique does not sufficiently reduce the data rate, require buffering to be used to allow excess data to be transmitted during periods of better compression performance.

There are many known data compression techniques, some of which are “lossless” (i.e. perfect reconstruction of the original data is possible) and some of which are “lossy” (i.e. perfect reconstruction of the original data is not possible). However, generic data compression techniques tend not to be able to achieve the optimal compression of data of a particular type or in a particular application. Improved data compression performance can be achieved if it is based on knowledge of the type of data to be compressed and of what aspects of the information in that data are of importance.

According to the present invention there is provided a method of compressing spectral data constituting a representation of a signal from a sensor, comprising the steps of:

thresholding the spectral data with respect to a noise floor to leave remaining as non-zero values only values above the noise floor;

encoding the spectral data as a first dataword constituting a bitmap in which each bit represents the presence or absence of a non-zero value at each of a plurality of points in the spectrum; and

encoding a second dataword consisting of the non-zero values.

The thresholding of the data with respect to a noise floor allows a considerable reduction in the amount of data to be encoded. The encoding of the remaining data in the form of two datawords, in which the first forms a “map” of the positions of non-zero values, allows easy reconstruction of the data. It will be appreciated that the thresholding with respect to a noise floor means that the compression technique is “lossy” in that it is not possible to reconstruct the complete original signal perfectly. By “non-zero signal value” in this context is meant values which are greater than the noise floor.

The invention is particularly suitable for compression of data in monitoring of equipment that spends significant time in a steady state, for example rotating machinery such as engines and generators.

The spectral data may comprise data obtained by sampling the sensor signal at a plurality of sampling rates to provide a corresponding plurality of spectral representations each at a different frequency resolution and each extending to a different maximum frequency in the spectrum, and merging the plurality of spectral representations into a single spectral representation by retaining only the highest frequency resolution data for each range of the spectrum. Where a signal is sampled into frequency domain data it contains a fixed number of bins of amplitudes from zero to the Nyquist frequency (50% of the sample rate). Where the sampling rate and thus Nyquist frequency is higher, the frequency range covered is bigger, so with a fixed number of bins each bin covers a bigger frequency range, making the resolution lower. If, as is often the case, that signal is sampled into multiple frequency ranges to give a mixture of high frequency, low resolution and also low frequency, high resolution data, each will contain copies of some of that data (the lower frequency end). Those copies in the lower resolution spectra can be discarded as redundant. Thus a merged spectral representation can be produced in which the lower frequencies in the spectrum are represented with a finer resolution than higher frequencies.

Preferably the non-zero signal values are quantised to further compress the signal, preferably using non-linear quantisation in which the quantisation steps are sized to keep the ratio of value to error approximately equal throughout the quantisation range. Thus the quantisation step size is larger for high signal levels and smaller for low signal levels. Preferably the non-linear quantisation is calculated between adaptive upper and lower bounds calculated from the data; the lower bound may be the noise floor and the upper bound the maximum amplitude in the spectral data.

The second dataword preferably comprises the non-zero signal values concatenated together.

The method is applicable to the compression of a series of spectral data sets representing a continually sampled sensor signal, and in this case respective first datawords of sets in the series can be compared and, if they are the same or similar, a “re-use” flag may be set and the first dataword of the second (and subsequent) data sets in the series can be discarded. This means that only the re-use flag needs to be transmitted and/or stored, resulting in significant reduction of the amount of data.

The test for whether two datawords are considered to be similar can be based on: comparison of the second dataword's values to a predefined threshold derived from the noise floor; determining whether a non-zero value in the second dataword returns to zero in an immediately succeeding dataword in the series; and determining whether fewer than a predetermined number of values change from zero to non-zero or vice versa between the datawords being compared.

If spectral data sets in the series are not similar, but the only difference is that a signal value which was non-zero at a spectral point in the first data set has become zero at the corresponding point in the second data set, it is possible again to set the re-use flag and discard the first dataword for the second data set, but the signal value for the corresponding point in the second dataword of the second data set is set to zero. Thus, whereas normally only non-zero values are included in the second dataword, in this situation a zero value is included, but this allows the whole first dataword to be discarded.

Preferably, the noise floor is set by generating a histogram from the amplitudes in the spectral data set and fitting a threshold where the gradient of that histogram approaches zero.

Spectral signals, that is to say signals which record the amount of energy in a plurality of frequency bands, are typically peaky and the technique is particularly adapted to encoding well the information in the peaks, while discarding the lower level signal. Such a signal can be produced by fast Fourier transform of an original sensor signal, for example a vibration signal, e.g. from a mechanical system such as an engine.

The invention extends to a data compression apparatus which executes the method, and to a computer program which can execute the method on a programmed computer and which may be tangibly embodied on a data storage medium.

The invention also extends to an airborne engine monitoring system comprising a data processing apparatus adapted to compress at least one of engine vibration data and performance data in accordance with the method.

The invention will be further described by way of example with reference to the accompanying drawings in which:

FIG. 1A is a flow diagram explaining the data compression method according to one embodiment of the invention;

FIG. 1B schematically illustrates the compression of data according to the process of FIG. 1A;

FIG. 2 is a flow diagram illustrating the application of the process of FIG. 1A to a series of finite data signals;

FIG. 3 is a flow diagram explaining the reconstruction of the signal from the compressed data according to one embodiment of the invention;

FIG. 4 illustrates schematically non-linear quantisation in one embodiment of the invention; and

FIG. 5 illustrates schematically the merging of spectral data sets of different sampling rates.

An embodiment of the invention will now be described which was developed in particular for the compression of vibration data from a jet engine. In this application, compression of data sets down to 5% of their original size was achieved.

As illustrated in FIGS. 1A and 1B, in this example data was produced in step 100 from vibration sensors in a jet engine, the original data being in the time domain. With this type of data significant information resides in the frequency domain, and in particular within peaks in the signal spectrum which often relate to particular rotating elements in the engine. The data is therefore subjected to a fast Fourier transform in step 102 to produce spectral data, i.e. successive data sets each indicating for a particular time period the distribution of energy in different frequency bands. FIG. 1B illustrates three successive such data sets.

Such data is typically acquired from a variety of different sensors and also in some cases at several different sampling rates from the same sensor. Typically the signal is sampled into a fixed number (e.g. 410) of frequency bins, which thus have to cover the whole frequency range for that sampling rate. Because the maximum frequency that can be represented is half the sampling rate, the frequency range for high sampling rates is larger, so each of the fixed number of bins has to cover a larger frequency range itself, and thus the resolution of that spectral data is lower. The different sampling rate spectral data for a single sensor thus cover a variety of overlapping frequency ranges at a variety of resolutions: all will represent the lower frequency end of the spectrum, but successively fewer of them will represent the higher frequencies. Advantage can be taken of this by keeping for any given frequency range only the spectral data with the maximum resolution (minimum sampling rate) for that range. This is illustrated schematically in FIG. 5. FIGS. 5(a) to 5(d) illustrate four spectral data sets, each representing the same sensor signal but at different sampling rates. Each data set has 410 frequency bins (the x-axis) and so in FIG. 5(a), where the sampling rate is 10 kHz, the 410 bins cover the range from 0 to 5 kHz, and so each bin covers a 12 Hz range. In FIG. 5(b) the sampling rate is 5 kHz, so the frequency range is 0 to 2.5 kHz and each bin covers a range of 6 Hz. Thus it covers the lower half of the spectrum of FIG. 5(a), but at twice the frequency resolution. FIGS. 5(c) and (d) cover successively lower halves of the spectrum, each time at double the resolution.

In step 103 advantage is taken of the overlap of the frequency ranges by merging the different sampling rate data sets (for the same time period of the same sensor) as shown in FIG. 5(e) to keep only the highest resolution data for each frequency range. Thus 0 to 625 Hz is represented by the 410 bins of FIG. 5(d) where each bin is 1.5 Hz wide, 625 Hz to 1.25 kHz is represented by the top 205 bins of FIG. 5(c) where each bin is 3 Hz wide, 1.25 to 2.5 kHz is represented by the top 205 bins of FIG. 5(b) where each bin is 6 Hz wide, and 2.5 to 5 kHz is represented by the top 205 bins of FIG. 5(a) where each bin is 12 Hz wide. This represents a useful compression of the data because half of the data from the higher sampling rates is being discarded.
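The merge of step 103 can be sketched in a few lines. This is only an illustration under stated assumptions: each spectrum is taken to be a plain list of bin amplitudes paired with its sampling rate, all spectra have the same number of bins, and the function name merge_spectra and the output tuple layout are hypothetical, not taken from the embodiment.

```python
def merge_spectra(spectra):
    """Merge spectra of one sensor recorded at several sampling rates.

    spectra: list of (sample_rate_hz, bins) pairs (assumed layout), where
    every `bins` list has the same length. Returns a list of
    (bin_start_hz, bin_width_hz, amplitude) tuples that keeps, for each
    frequency range, only the finest-resolution data available.
    """
    merged = []
    covered_hz = 0.0  # upper edge of the range already represented
    # Lowest sampling rate first: it has the finest frequency resolution.
    for sample_rate_hz, bins in sorted(spectra, key=lambda s: s[0]):
        nyquist_hz = sample_rate_hz / 2.0
        n = len(bins)
        width_hz = nyquist_hz / n
        for k, amplitude in enumerate(bins):
            # Bin k starts at k * nyquist / n. Comparing k * nyquist with
            # covered * n avoids a rounding-sensitive division.
            if k * nyquist_hz >= covered_hz * n:
                merged.append((k * width_hz, width_hz, amplitude))
        covered_hz = nyquist_hz
    return merged
```

With four 410-bin spectra at 1.25, 2.5, 5 and 10 kHz, this keeps all 410 bins of the lowest-rate spectrum and the top 205 bins of each of the others, i.e. 1025 bins in place of 1640.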

In step 104 the noise floor in each data set is identified and removed by thresholding. This results, for each data set, in the retention of only peaks in the spectral data as schematically illustrated in FIG. 1B. In the example application of the technique to vibration data, the removal of low level noise, which constitutes irrelevant data not associated with any particular vibration characteristic, considerably reduces the amount of data to be processed, stored and transmitted.

The noise floor can be estimated by fitting (e.g. by maximum likelihood estimation, MLE) an exponential curve to the low amplitude section of a histogram of squared FFT magnitudes. The noise floor is then taken at the point where the gradient of this curve approximates to zero. Other ways of thresholding to remove noise can be used. Different noise floor thresholds can be used for different parts of the frequency space (which advantageously allows for the fact that at higher frequencies the amplitude of interesting data tends to be lower, thus potentially falling beneath a noise floor suitable for lower frequencies).
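One possible reading of this estimate is sketched below. Several choices here are assumptions rather than details of the embodiment: the "low amplitude section" is taken to be the lower half of the sorted squared magnitudes, no explicit histogram is built (the maximum likelihood rate of an exponential distribution is simply the reciprocal of the sample mean), and the gradient is treated as "approximately zero" once it has fallen to a fraction tol of its value at zero. The names estimate_noise_floor, threshold, low_fraction and tol are hypothetical.

```python
import math
import statistics


def estimate_noise_floor(magnitudes, low_fraction=0.5, tol=0.01):
    """Estimate an amplitude noise floor from FFT magnitudes (sketch)."""
    powers = sorted(m * m for m in magnitudes)
    # Assumption: the lowest `low_fraction` of the powers is noise only.
    n_low = max(1, int(len(powers) * low_fraction))
    mean_power = statistics.fmean(powers[:n_low])
    if mean_power == 0.0:
        return 0.0
    rate = 1.0 / mean_power  # MLE of the exponential rate parameter
    # For density rate*exp(-rate*x), the gradient magnitude decays as
    # exp(-rate*x); it reaches `tol` of its initial value at this x.
    floor_power = math.log(1.0 / tol) / rate
    return math.sqrt(floor_power)  # convert power back to amplitude


def threshold(magnitudes, noise_floor):
    """Zero every bin at or below the noise floor (step 104)."""
    return [m if m > noise_floor else 0.0 for m in magnitudes]
```

A per-band variant, as suggested in the text, would call estimate_noise_floor separately on each frequency sub-range.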

In the particular vibration monitoring example mentioned above, the values in the spectral data are encoded as four byte floating point numbers but in this embodiment the amount of data is further reduced by non-linear quantisation as illustrated in step 108. In this embodiment the data is re-quantised to eight bit data using a look-up table containing a non-linear sampling of 255 values between the noise floor and the maximum amplitude in the spectral data (QMax). The floating point representation is then replaced by an index into this table, found by a binary search as giving the minimum quantisation error (the difference between the original signal amplitude and the quantum). FIG. 4 illustrates this schematically.

\[ \mathrm{step} = \left( \mathrm{QMax} / \mathrm{Noise\ Floor} \right)^{1/\mathrm{NBits}} \]

\[ \mathrm{quanta}_i = \mathrm{min} + \mathrm{step}^{\,i} \]
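A hedged sketch of such a look-up table and binary search follows. It does not reproduce the formula above term for term: as an assumption, the table is spaced geometrically between the noise floor and QMax, which is one straightforward way of keeping the ratio of value to error roughly constant. The function names build_table and quantise and the levels parameter are hypothetical.

```python
import bisect


def build_table(noise_floor, q_max, levels=255):
    """Geometrically spaced quanta from noise_floor to q_max (assumed form)."""
    ratio = q_max / noise_floor
    return [noise_floor * ratio ** (i / (levels - 1)) for i in range(levels)]


def quantise(value, table):
    """Return the index of the table entry nearest to `value`.

    bisect performs the binary search; the two neighbouring entries are
    then compared to pick the one with the smaller quantisation error.
    """
    pos = bisect.bisect_left(table, value)
    if pos == 0:
        return 0
    if pos == len(table):
        return len(table) - 1
    lower, upper = table[pos - 1], table[pos]
    return pos if upper - value < value - lower else pos - 1
```

Because the table is rebuilt from just the noise floor and QMax, only those two numbers need to accompany the indices for the decoder to recover the quanta.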

As illustrated in FIG. 1B these steps mean that the resulting data sets each consist of non-zero values a′ through m′ encoding the peaks in the original signal. Of course, although FIG. 1B illustrates graphical plots of the data, in practice the data consists of one signal value for each of the plurality of frequency “bins” (corresponding to the abscissa).

The elimination of signal values below the noise floor of step 104 corresponds, therefore, to the exclusion of any frequency bins whose data value is below the noise floor. However, in order for the decompressed data to be useful it is necessary to be able to reconstruct the data into its original format. This is achieved in this embodiment by encoding the data set as a first dataword 200, a “locations dataword”, which forms a bitmap of those locations (bins) which contain non-zero values. This dataword preferably has one bit per frequency bin of the FFT. Thus in FIG. 1B a zero in the locations dataword 200 indicates that this frequency bin has zero value (at or below the noise floor), whereas a one indicates that there is a non-zero value. The data values themselves are concatenated to form a second “values” dataword 202. It can be seen, therefore, that the first three data values a′, b′, c′ in the second dataword 202 correspond to the first peak in the example data, and the position of this peak is represented by the first set of ones in the first dataword 200. (Note that whereas the locations dataword 200 has one bit per frequency bin, the values dataword 202 has to encode the value for each bin and this clearly requires more bits per bin, the number depending on the quantisation used, e.g. 8 bits per bin).
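The two datawords can be illustrated with a short sketch. The packing order (most significant bit first within each byte) and the function name encode_locations_and_values are assumptions made for the example; the text only requires one bit per frequency bin.

```python
def encode_locations_and_values(bins):
    """Split thresholded bins into a locations bitmap and a values list.

    bins: sequence of values in which 0 means "at or below the noise floor".
    Returns (locations, values): `locations` is a bytes object with one bit
    per bin (MSB-first, an assumed convention), `values` holds the non-zero
    entries in bin order, ready to be concatenated.
    """
    locations = bytearray((len(bins) + 7) // 8)
    values = []
    for k, value in enumerate(bins):
        if value != 0:
            locations[k // 8] |= 0x80 >> (k % 8)  # set bit k
            values.append(value)
    return bytes(locations), values
```

For example, ten bins with a peak in bins 2 to 4 and a single value in bin 9 give the bitmap bytes 0x38 and 0x40 and four values.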

As well as the two datawords 200 and 202, the value of the noise floor and maximum value (Qmax) in the spectral data set are also recorded as two byte (16 bit) datawords 203 and 204, together with the broadband power of the original signal, also as a two byte dataword 206, as illustrated in step 112. The average broadband power is simply the square root of the sum of all the squares of the bin values in the original data set before noise floor removal.
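The broadband power calculation described here is small enough to state directly. The function name broadband_power is hypothetical, and the sketch leaves out the two byte encoding because the text does not specify its scaling.

```python
import math


def broadband_power(bins):
    """Square root of the sum of squared bin values, taken before thresholding."""
    return math.sqrt(sum(value * value for value in bins))
```

It must be evaluated on the original data set, since the thresholded set no longer contains the noise contribution.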

Steps 104 to 112 of FIG. 1A are illustrated as applied to a single data set, but it will be appreciated that with continuous monitoring of the system, such as an engine, there will be a continuous series of such FFT data sets being produced, each successive set corresponding to a successive time portion of the sensor signal. In this embodiment further compression of the data can be achieved by comparing the locations datawords 200 of successive FFT data sets. As illustrated in FIG. 2, at step 210 two successive data sets are taken and the locations datawords 200 are compared. In step 212 it is checked whether or not the locations datawords 200 are similar. If they are similar enough (as explained below) then, as illustrated in step 214, the first (locations) dataword for the second of the data sets is discarded and instead a one bit “re-use” flag 208 is set to indicate that the locations dataword from the preceding FFT data set can be used again. Thus the compressed data for the second data set is the re-use locations flag 208, together with the values dataword 202 and the noise floor and broadband power values 204, 206 which are added in step 215. In the case of vibration data from an engine running in a steady state the peaks in the signal remain largely stable and thus this technique allows a significant reduction in the amount of data for transmission and storage.

Of course in some situations, the locations datawords compared in step 212 will not be the same, but they may be similar. In essence, there are two possibilities: the first is that there are new locations which have non-zero data and the second is that locations which formerly had non-zero data now have a zero value. These two situations are distinguished in step 216. In the first situation, where there are new locations with non-zero value that are not excluded by the same criteria for similarity as in step 212 then, as indicated in step 218, a new locations dataword 200 and new FFT value dataword 202 must be used, together with the noise floor, QMax and broadband power values added in step 219. However, in the second case, where certain locations which formerly had non-zero values now have zero value, then as indicated in step 220, the re-use flag 208 is set and the locations dataword is discarded, but values in the second (values) dataword 202 which correspond to locations that now have zero value are set to zero in step 222, and the noise floor, QMax and broadband power values are added as in step 224.

It will be appreciated that in step 220 the inclusion of zero signal values increases the size of the second dataword 202 compared to the ideal (in which it encodes only non-zero signal values), but it avoids the need to send a new locations dataword. In fact, when considering a series of more than two data sets there is a point at which the sending of zero signal values in the second datawords, together with the re-use locations flag, costs more than sending a new locations dataword. Whilst the number of locations needed to switch below the noise floor to achieve this is so high that it is unlikely to happen without other criteria necessitating a new locations dataword, a new locations dataword will be sent in this instance.
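The decision between re-using the previous locations dataword and sending a new one can be sketched as follows. This is a simplified illustration, not the flow of FIG. 2 itself: it omits the similarity tests, it evaluates the cost trade-off for a single data set rather than over a series, and the function name encode_with_reuse, the list-of-booleans representation of the locations dataword, and the bits_per_value parameter are all assumptions.

```python
def encode_with_reuse(prev_locations, bins, bits_per_value=8):
    """Return (reuse_flag, locations_or_None, values) for one data set.

    prev_locations: list of bools, the previous set's locations bitmap.
    bins: current thresholded values (0 = at or below the noise floor).
    """
    current = [value != 0 for value in bins]
    fresh = (False, current, [value for value in bins if value != 0])
    if len(prev_locations) != len(current):
        return fresh
    # Any newly non-zero location forces a new locations dataword.
    if any(now and not before for before, now in zip(prev_locations, current)):
        return fresh
    # Locations that dropped to zero are sent as explicit zero values.
    dropped = sum(1 for before, now in zip(prev_locations, current)
                  if before and not now)
    # Each explicit zero costs bits_per_value bits; a new bitmap costs one
    # bit per bin. Send a new bitmap once the zeros cost more.
    if dropped * bits_per_value > len(current):
        return fresh
    values = [value for before, value in zip(prev_locations, bins) if before]
    return True, None, values
```

With 16 bins and 8 bit values, one dropped location is cheaper to send as a zero, whereas three dropped locations (24 bits) would exceed the 16 bit bitmap and trigger a new locations dataword.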

To decide whether the locations datawords are similar (in step 212) and whether non-zero data should be excluded in step 217 a series of tests is adopted to check whether the change is significant. For example, for each of the frequency bins whose value has changed, the system can check:

1) Are the new signal values more than a set threshold (absolute and relative) from the noise floor?

2) Do the frequency bins return to zero in the next data set?

3) Do more than a maximum number of frequency bins cross the noise floor?

These checks avoid the problem that from one data set to the next a certain number of frequency bins may change from non-zero to zero or vice versa by changing from being just above or just below the noise floor. The change in signal value may be quite small and actually insignificant, but nevertheless occur in a sufficient number of locations to cause the embodiment above to send a completely new locations dataword 200.

Considering these three criteria in more detail:

1) If a value in a frequency bin is only just above the noise floor then its importance may be negligible. A comparison of the absolute difference and/or the percentage difference between the bin and noise floor is used to decide whether the bin is significant. In this embodiment the bin must be above the noise floor multiplied by a scaling factor of √2.

2) If a signal value in a first data set under consideration is below the noise floor, then in the next data set is above the noise floor, and in a third is below the noise floor, this can be regarded as an insignificant change and ignored.

3) If more than a certain number of frequency bins (e.g. 10, 20 or 50) cross the noise threshold to include non-zero signal values in the new data set then it should be assumed that some interesting event has occurred in the system being monitored and so new locations datawords are sent.
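The three criteria can be combined in a sketch such as the one below. How the tests interact is not spelled out in the text, so the combination used here is an assumption: a new locations dataword is requested if any newly non-zero bin passes tests 1 and 2, or if the total number of newly non-zero bins exceeds the limit of test 3. The function name needs_new_locations and its parameters are hypothetical.

```python
import math


def needs_new_locations(prev_locations, bins, next_bins, noise_floor,
                        max_crossings=10):
    """Decide whether newly non-zero bins justify a new locations dataword.

    prev_locations: list of bools from the previous data set.
    bins, next_bins: thresholded values of the current and following sets.
    """
    crossings = 0    # bins that changed from zero to non-zero
    significant = 0  # crossings that survive tests 1 and 2
    for was_set, value, next_value in zip(prev_locations, bins, next_bins):
        if was_set or value == 0:
            continue  # not a newly non-zero bin
        crossings += 1
        if value <= noise_floor * math.sqrt(2.0):
            continue  # test 1: only marginally above the noise floor
        if next_value == 0:
            continue  # test 2: returns to zero in the next data set
        significant += 1
    # Test 3: many crossings at once suggest an event worth recording.
    return significant > 0 or crossings > max_crossings
```

Note that test 2 looks one data set ahead, so an encoder built this way would have to buffer each data set until its successor is available.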

FIG. 3 illustrates schematically how the data is reconstructed from the compressed data. As illustrated in step 300 the locations dataword 200 (or the re-use flag 208), the values dataword 202 and the noise floor, QMax and broadband power values 204, 206 are taken. The first step, in step 302, is to generate noise to represent the noise in the original FFT. This is achieved by adding in logarithmically scaled random numbers between zero and the noise floor.

Then in step 304 values are fitted to the non-zero signal values in the positions defined by the locations dataword (or reusing the previous locations dataword if the re-use flag is set). The quantisation look-up table is reconstructed from the noise floor and QMax in the same way as it was built for compression. The value from the values dataword is simply an index into this look-up table, from which a quantum can be retrieved that approximates to the original signal amplitude.
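Steps 302 and 304 can be sketched together. The form of the "logarithmically scaled" noise is not given in the text, so the sketch assumes values spread uniformly in the logarithm over a fixed number of decades below the noise floor; that choice, the decades parameter and the function name reconstruct are hypothetical. The look-up table is passed in already rebuilt, and explicit zero values sent under the re-use flag are not handled.

```python
import random


def reconstruct(location_bits, indices, table, noise_floor,
                decades=3.0, rng=None):
    """Rebuild one spectrum from a locations bitmap and table indices.

    location_bits: list of bools, one per frequency bin.
    indices: quantisation indices for the bins flagged True, in order.
    table: quantisation look-up table rebuilt from noise floor and QMax.
    """
    rng = rng or random.Random()
    remaining = iter(indices)
    spectrum = []
    for present in location_bits:
        if present:
            # Step 304: the index selects the quantum approximating the peak.
            spectrum.append(table[next(remaining)])
        else:
            # Step 302: synthetic noise in (0, noise_floor], log-distributed.
            spectrum.append(noise_floor * 10.0 ** (-decades * rng.random()))
    return spectrum
```

Passing a seeded random.Random instance makes the synthetic noise repeatable, which is convenient when comparing reconstructions.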

SPECIFIC EXAMPLE

The following table contains the compression rates of vibration spectra from 18 runs of a gas-turbine engine. Some of the short sets are from ground running; the longer runs are in-flight data. The ground runs tend to compress even more than the expected 20:1 ratio due to relatively static conditions. This shows that for assets that operate at steady state speeds for long periods, such as power generation turbines, the invention provides benefit from even greater compression rates.

Compressed    Decompressed    Compression
Size (MB)     Size (MB)       Ratio X:1
46.8          900             19.23
0.176         5.46            31.02
31.4          602             19.17
70.6          1,310           18.56
34.1          648             19.00
71            1,320           18.59
31.1          633             20.35
72.1          1,340           18.59
0.348         10.5            30.17
2.41          46.6            19.34
22.4          432             19.29
1.76          36.1            20.51
63.7          1,190           18.68
23.7          452             19.07
3.54          72.2            20.40
64.3          1,190           18.51
2.93          56.7            19.35
0.875         20.5            23.43

Although the example above is with specific reference to vibration data from a jet engine monitoring system, it will be appreciated that the techniques are applicable to all other types of data where a spectral representation is useful, for example acoustic emissions, other vibration data, pressures and strain gauges, especially in the field of equipment monitoring.

The invention claimed is:
 1. A method of compressing spectral data constituting a representation of a signal from a sensor, comprising the steps of: thresholding the spectral data with respect to a noise floor to leave remaining as non-zero values only values above the noise floor; encoding the spectral data as a first dataword constituting a bitmap in which each bit represents the presence or absence of non-zero value at each position in the spectrum; and encoding a second dataword consisting of the non-zero values.
 2. A method according to claim 1 wherein the spectral data comprises data obtained by sampling the sensor signal at a plurality of sampling rates to provide a corresponding plurality of spectral representations each at a different frequency resolution and each extending to a different maximum frequency in the spectrum, and merging the plurality of spectral representations into a single spectral representation by retaining only the highest frequency resolution data for each range of the spectrum.
 3. A method according to claim 1, wherein the non-zero signal values are quantised to further compress the signal.
 4. A method according to claim 3 wherein said quantisation is non-linear quantisation.
 5. A method according to claim 4 wherein the non-linear quantisation is set to keep the quantisation error proportional to the original signal value.
 6. A method according to claim 5 wherein the non-linear quantisation is calculated between adaptive upper and lower bounds calculated from the data.
 7. A method according to claim 1 wherein the second dataword comprises the non-zero signal values concatenated together.
 8. A method according to claim 1 wherein the sensor signal is continually sampled and a series of sets of spectral data are produced to represent the sensor signal, each set being compressed in accordance with any one of the preceding claims, the method further comprising comparing respective first datawords of two successive ones of said sets and, if the compared datawords are the same or similar, setting a re-use flag and discarding the first dataword of the second set of spectral data.
 9. A method according to claim 8 further comprising the step of comparing respective first datawords of two of said spectral data sets in said series and if the difference between them is that a signal value that was non-zero at a point in said first spectral data set has become zero at the corresponding point in said second spectral data set, setting the re-use flag, discarding the first dataword of the second spectral data set, and setting the signal value for said corresponding point in said second dataword to zero.
 10. A method according to claim 8 where two datawords are considered to be similar based on: comparison of the second dataword's values to a predefined threshold derived from the noise floor; determining whether a non-zero value in the second dataword returns to zero in an immediately succeeding dataword in the series; and determining whether fewer than a predetermined number of values change from zero to non-zero or vice versa between the datawords being compared.
 11. A method according to claim 1 wherein the spectral data is a Fast Fourier Transform of the sensor signal.
 12. A method according to claim 1 wherein the sensor signal is a vibration signal, for example from a mechanical system such as a rotating machine.
 13. A method according to claim 1 wherein the sensor signal is a Fast Fourier Transform of a vibration signal from a gas turbine engine.
 14. A data compression apparatus adapted to execute the method of claim 1.
 15. A computer program comprising program code means for executing on a programmed computer the method of claim 1.
 16. An airborne engine monitoring system comprising a data processing apparatus adapted to compress at least one of engine vibration data and performance data in accordance with the method of claim 1.